Abstract

We reconsider stochastic convergence analyses of particle swarm optimisation,
and point out that previously obtained parameter conditions are not always
sufficient to guarantee mean square convergence to a local optimum. We show
that stagnation can in fact occur for non-trivial configurations in non-optimal
parts of the search space, even for simple functions like Sphere. The
convergence properties of the basic PSO may in these situations be detrimental
to the goal of optimisation, to discover a sufficiently good solution within
reasonable time. To characterise optimisation ability of algorithms, we suggest
the expected first hitting time (FHT), i.e., the time until a search point in
the vicinity of the optimum is visited. It is shown that a basic PSO may have
infinite expected FHT, while an algorithm introduced here, the Noisy PSO, has
finite expected FHT on some functions.



1 Introduction 

Particle Swarm Optimisation (PSO) is an optimisation technique for functions 
over continuous spaces introduced by Kennedy and Eberhart [10]. The algorithm
simulates the motions of a swarm of particles in the solution space. While
limited by inertia, each particle is subject to two attracting forces, towards the
best position $P_t$ visited by the particle, and towards the best position $G_t$
visited by any particle in the swarm. The update equations for the velocity $V_t$
and the position $X_t$ are given in Algorithm 1 in Section 2. The inertia factor
$\omega$ and the acceleration coefficients $\varphi_1$ and $\varphi_2$ are user-specified parameters. The
algorithm only uses objective function values when updating G and P, and does 
not require any gradient information. So the PSO is a black-box algorithm [3]. 
It is straightforward to implement and has been applied successfully in many 
optimisation domains. Despite its popularity, the theoretical understanding of 
the PSO remains limited. In particular, how do the parameter settings influence 
the swarm dynamics, and in the end, the performance of the PSO? 

One of the best understood aspects of the PSO dynamics are the conditions 
under which the swarm stagnates into an equilibrium point. It is not too difficult 
to see (e.g., [3]) that velocity explosion can only be avoided when the inertia 
factor is bounded by 

    $|\omega| < 1.$    (1)

The magnitude of the velocities still depends heavily on how the global best
$G_t$ and local best $P_t$ positions evolve with time $t$, which in turn is
influenced by the function that is optimised. To simplify matters, it has
generally been assumed that the swarm has entered a stagnation mode, where the
global and local best particle positions $G_t$ and $P_t$ remain fixed. Under this assumption,
there is no interaction between the particles, or between the problem dimensions, 
and the function to be optimised is irrelevant. The swarm can therefore be 
understood as a set of independent, one-dimensional processes. 

An additional simplifying assumption made in early convergence analyses 
was to disregard the stochastic factors R and S, replacing them by constants 
[T51 13] . Trelea [2] analysed the 1-dimensional dynamics under this assumption, 
showing that convergence to the equilibrium point 

    $P_e = (\varphi_1 P + \varphi_2 G)/(\varphi_1 + \varphi_2)$    (2)

occurs under condition (1) and

    $0 < \varphi_1 + \varphi_2 < 4(1 + \omega).$    (3)

Kadirkamanathan et al. [5] were among the first to take the stochastic effects 
into account, approaching the dynamics of the global best particle position (for 
which $P = G$) from a control-theoretic angle. In particular, they considered
asymptotic Lyapunov stability of the global best position, still under the as- 
sumption of fixed P and G. Informally, this stability condition is satisfied if the 
global best particle always converges to the global best position when started 
nearby it. Assuming a global best position in the origin, their analysis shows 
that condition (1), $\omega \neq 0$, and

    $\varphi_1 + \varphi_2 < 2(1 - 2|\omega| + \omega^2)/(1 + \omega)$    (4)

are sufficient to guarantee asymptotic Lyapunov stability of the origin. These 
conditions are not necessary, and are conservative. Another stochastic mode 
of convergence considered, is convergence in mean square (also called second 
order stability) to a point $x^*$, defined as $\lim_{t\to\infty} E(|X_t - x^*|^2) = 0$. Mean
square convergence to $x^*$ implies that the expectation of the particle position
converges to $x^*$, while its variance converges to 0. It has been claimed that
all particles in the PSO converge in mean square to the global best position
if the parameter triplet $\omega, \varphi_1, \varphi_2$ is set appropriately. Jiang et al. derived
recurrence equations for the sequences E (X t ) and Var(X t ) assuming fixed G 
and P, and determined conditions, i. e. a convergence region, under which these 
sequences are convergent. The convergence region considered in [8] is strictly 
contained in the convergence region given by the deterministic condition (3). For
positive $\omega$, the Lyapunov stability region described by condition (4) is strictly
contained in the mean square stability region. Given the conditions indicated
in Figure 1, the expectation will converge to $P_e$ (as in Eq. (2)), while the
variance will converge to a value which is proportional to $(G - P)^2$. It is claimed
that the local best P converges to G, which would imply that the variance 
converges to 0. However, as we will explain in later sections, this is not generally 
correct. We will discuss further assumptions that are needed to fix the claim 
of [5]. Wakasa et al. [IH] pointed out an alternative technique for determining 
mean square stability of specific parameter triplets. They showed that this 
problem, and other problems related to the PSO dynamics, can be reduced
to checking the existence of a matrix satisfying an associated linear matrix
inequality (LMI). This is a standard approach in control theory, and is popular
because the reduced LMI problem can be solved efficiently using convex- and 
quasi-convex optimisation techniques HJ. Wakasa et al. [TS] obtained explicit 
expressions for the mean square stability region, identical to the stability region 
obtained in [8], using this technique. Assuming stagnation, Poli [12] provided 
recurrence equations for higher moments (e. g. skewness and kurtosis) of the 
particle distribution. The equations for the m-th moment are expressed with 
an exponential number of terms in m, but can be solved using computer algebra 
systems for not too high moments. 
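
The deterministic and Lyapunov conditions quoted in this section are easy to
check numerically. The following is a minimal sketch, assuming condition (1) is
$|\omega| < 1$, condition (3) is $0 < \varphi_1 + \varphi_2 < 4(1 + \omega)$, and
condition (4) is $\varphi_1 + \varphi_2 < 2(1 - 2|\omega| + \omega^2)/(1 + \omega)$
together with $\omega \neq 0$. The function names are hypothetical and do not
come from any of the cited papers.

```python
def in_deterministic_region(omega, phi1, phi2):
    """Conditions (1) and (3): |omega| < 1 and 0 < phi1 + phi2 < 4(1 + omega)."""
    return abs(omega) < 1 and 0 < phi1 + phi2 < 4 * (1 + omega)


def in_lyapunov_region(omega, phi1, phi2):
    """Condition (1), omega != 0, and condition (4) (sufficient, conservative)."""
    if omega == 0 or abs(omega) >= 1:
        return False
    bound = 2 * (1 - 2 * abs(omega) + omega ** 2) / (1 + omega)
    return phi1 + phi2 < bound


if __name__ == "__main__":
    # Illustrative parameter triplets (omega, phi1, phi2), chosen arbitrarily.
    for params in [(0.7, 1.5, 1.5), (0.1, 0.5, 0.5), (0.9, 2.5, 2.5)]:
        print(params,
              in_deterministic_region(*params),
              in_lyapunov_region(*params))
```

The triplet (0.7, 1.5, 1.5) lies in the deterministic region but not in the
Lyapunov region, which illustrates how conservative condition (4) is.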

Recently, there has been progress in removing the stagnation assumption on 
P and G. Building on previous work by Brandstatter and Baumgartner [2J, 
Fernandez-Martinez and Garcia-Gonzalo [5] interpret the PSO dynamics as a 
discrete-time approximation of a certain spring-mass system. From this
mechanical interpretation follows naturally a generalisation of the PSO with
adjustable time step $\Delta t$, where the special case $\Delta t = 1$ corresponds to
the standard PSO. In the limit where $\Delta t \to 0$, one obtains a continuous-time PSO governed by
stochastic differential equations. They show that dynamic properties of the 
discrete-time PSO approach those of the continuous-time PSO when the time 
step approaches 0. 

While theoretical research on PSO has mainly focused on convergence, there 
may be other theoretical properties that are more relevant in the context of 
optimisation. The primary goal in optimisation is to obtain a solution of ac- 
ceptable quality within reasonable time. Convergence may be neither sufficient, 
nor necessary to reach this goal. In particular, convergence is insufficient when 
stagnation occurs at non-optimal points in the solution space. Furthermore, 
stagnation is not necessary when a solution of acceptable quality has been found. 

Figure 1: Comparison of convergence regions. Noisy PSO indicates when the
precondition $f(1) > 1/3$ of Theorem 2 holds (x-axis: $\omega$, y-axis: $\varphi = \varphi_1 = \varphi_2$).

As an alternative measure, we suggest to consider, for arbitrarily small
$\epsilon > 0$, the expected time until the algorithm for the first time obtains a
search point $x$ for which $|f(x) - f(x^*)| < \epsilon$, where $f(x^*)$ is the
function value of an optimal search point, and where time is measured in the
number of evaluations of the objective function. We call this the expected first
hitting time (FHT) with respect to $\epsilon$. As a first condition, it is desirable
to have finite expected FHT for any constant $\epsilon > 0$. Informally, this means
that the algorithm will eventually find a solution of acceptable quality.
Secondly, it is desirable that the growth of the expected FHT is upper bounded
by a polynomial in $1/\epsilon$ and the number of dimensions $n$ of the problem.
Informally, this means that the algorithm will not only find a solution of
acceptable quality, but will do so within reasonable time.

Some work has been done in this direction. Sudholt and Witt [13] studied 
the runtime of the Binary PSO, i.e. in a discrete search space. Witt [17] con- 
sidered the Guaranteed Convergence PSO (GCPSO) with one particle on the 
Sphere function, showing that if started in unit distance to the optimum, then
after $O(n \log(1/\epsilon))$ iterations, the algorithm has reached the
$\epsilon$-ball around the optimum with overwhelmingly high probability. The
GCPSO avoids stagnation by resetting the global best particle to a randomly
sampled point around the best found position. The behaviour of the one-particle
GCPSO therefore resembles the behaviour of a (1+1) ES, and the velocity term
does not come into play. In fact, the analysis has some similarities with the
analysis by Jägersküpper [5].

The objectives of this paper are three-fold. Firstly, in Section 3, we show
that the expected first hitting time of a basic PSO is infinite, even on the simple
Sphere function. Secondly, in Section 4, we point out situations where the basic
PSO does not converge in mean square to the global best particle (which need
not be a global optimum), despite having parameters in the convergence region.
We discuss what extra conditions are needed to ensure mean square convergence.
Finally, in Section 5, we consider a Noisy PSO which we prove to have finite
expected FHT on the 1-dimensional Sphere function. Our results also hold
for any strictly increasing transformation of this function because the PSO is a
comparison-based algorithm.



2 Preliminaries 

In the following, we consider minimisation of functions. A basic PSO with swarm
size $m$ optimising an $n$-dimensional function $f : \mathbb{R}^n \to \mathbb{R}$ is defined below. This
PSO definition is well-accepted, and called Standard PSO by Jiang et al. [8].
The position and velocity of particle $i \in [m]$ at time $t \ge 0$ are represented by
the pair of vectors $X_t^{(i)} = (X_{t,1}^{(i)}, \dots, X_{t,n}^{(i)})$ and
$V_t^{(i)} = (V_{t,1}^{(i)}, \dots, V_{t,n}^{(i)})$. The
parameter $a > 0$ bounds the initial positions and velocities.



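Algorithm 1 Basic PSO

for each particle $i \in [m]$, and dimension $j \in [n]$ do
    $X_{0,j}^{(i)}, V_{0,j}^{(i)} \sim \mathrm{Unif}[-a, a]$;  $P_{0,j}^{(i)} = X_{0,j}^{(i)}$
end for
$G_0 = \arg\min\{f(P_0^{(1)}), \dots, f(P_0^{(m)})\}$
for $t = 0, 1, \dots$ until termination condition satisfied do
    for each particle $i \in [m]$, and dimension $j \in [n]$ do
        $V_{t+1,j}^{(i)} = \omega V_{t,j}^{(i)} + \varphi_1 R_{t,j}^{(i)} (P_{t,j}^{(i)} - X_{t,j}^{(i)}) + \varphi_2 S_{t,j}^{(i)} (G_{t,j} - X_{t,j}^{(i)})$    (5)
        $X_{t+1,j}^{(i)} = X_{t,j}^{(i)} + V_{t+1,j}^{(i)}$, where $R_{t,j}^{(i)}, S_{t,j}^{(i)} \sim \mathrm{Unif}[0, 1]$.    (6)
    end for
    $P_{t+1}^{(i)} = \arg\min\{f(X_{t+1}^{(i)}), f(P_t^{(i)})\}$ and
    $G_{t+1} = \arg\min\{f(P_{t+1}^{(1)}), \dots, f(P_{t+1}^{(m)})\}$.
end for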



Assume that a function $f : \mathbb{R}^n \to \mathbb{R}$ has at least one global minimum $x^*$.
Then for a given $\epsilon > 0$, the first hitting time (FHT) of the PSO on function $f$
is defined as the number of times the function $f$ is evaluated until the swarm
for the first time contains a particle $x$ for which $|f(x) - f(x^*)| < \epsilon$. We assume
that the PSO is implemented such that the function $f$ is evaluated no more
than $m$ times per time step $t$. As an example function, we consider the Sphere
problem, which for all $x \in \mathbb{R}^n$ is defined as $\mathrm{Sphere}(x) := \|x\|^2$, where $\|\cdot\|$
denotes the Euclidean norm. This is a well-accepted benchmark problem in
convergence analyses and frequently serves as a starting point for theoretical
analyses.
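
To make the definitions concrete, here is a minimal sketch of the basic PSO
that counts function evaluations and reports the first hitting time. It assumes
the velocity update adds the two attraction terms weighted by $\varphi_1$ and
$\varphi_2$, a synchronous update of the global best after each time step, and
Sphere as the squared Euclidean norm. The function name `basic_pso_fht`, the
evaluation budget `max_steps` and the `seed` argument are illustrative
additions, not part of the algorithm definition.

```python
import random


def sphere(x):
    """Sphere benchmark, taken here as the squared Euclidean norm."""
    return sum(xi * xi for xi in x)


def basic_pso_fht(f, n, m, omega, phi1, phi2, a, eps,
                  f_opt=0.0, max_steps=1000, seed=None):
    """Run a basic PSO; return the number of evaluations of f until a
    point with |f(x) - f_opt| < eps is found, or None if the budget ends."""
    rng = random.Random(seed)
    X = [[rng.uniform(-a, a) for _ in range(n)] for _ in range(m)]
    V = [[rng.uniform(-a, a) for _ in range(n)] for _ in range(m)]
    P = [x[:] for x in X]            # local best positions
    fP = []                          # local best function values
    evals = 0
    for i in range(m):
        fP.append(f(X[i]))
        evals += 1
        if abs(fP[i] - f_opt) < eps:
            return evals
    G = P[min(range(m), key=fP.__getitem__)][:]   # global best position
    for _ in range(max_steps):
        for i in range(m):
            for j in range(n):
                r, s = rng.random(), rng.random()
                V[i][j] = (omega * V[i][j]
                           + phi1 * r * (P[i][j] - X[i][j])
                           + phi2 * s * (G[j] - X[i][j]))
                X[i][j] += V[i][j]
            fx = f(X[i])
            evals += 1
            if abs(fx - f_opt) < eps:
                return evals
            if fx < fP[i]:
                P[i], fP[i] = X[i][:], fx
        G = P[min(range(m), key=fP.__getitem__)][:]
    return None


if __name__ == "__main__":
    # Parameter values below are arbitrary illustrations.
    print(basic_pso_fht(sphere, n=1, m=2, omega=0.7, phi1=1.5, phi2=1.5,
                        a=10.0, eps=1e-3, seed=1))
```

A return value of `None` only says that the budget was exhausted; it does not
by itself demonstrate stagnation.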


3 Stagnation

Particle convergence does not necessarily occur in local optima. There are
well-known configurations, e.g. with zero velocities, which lead to stagnation [15].
However, it is not obvious for which initial configurations and parameter
settings the basic PSO will stagnate outside local optima. Here, it is shown that
stagnation occurs already in one dimension for a broad range of initial
parameters. It follows that the expected first hitting time of the basic PSO can be
infinite.

As a first example of stagnation, we consider the basic PSO with swarm size 
one on the Sphere problem. Note that it is helpful to first study the PSO with 
swarm size one before analysing the behaviour of the PSO with larger swarm 
sizes. This is similar to the theory of evolutionary algorithms (EAs), where it is 
common to initiate runtime analyses on the simple (1+1) EA with population 
size one, before proceeding to more complex EAs. 

Proposition 1. The basic PSO with inertia factor $\omega < 1$ and one particle
($m = 1$) has infinite expected FHT on Sphere ($n = 1$).

Proof. We say that the bad initialisation event has occurred if the initial position
and velocity satisfy $X_0 > \epsilon a$ and $(\epsilon a - X_0)/(1 - \omega) < V_0 < 0$. This event occurs
with positive probability. We claim that if the event occurs, then in any iteration
$t \ge 1$, $V_t = V_0 \omega^t$ and $X_t = X_0 + V_0 \sum_{i=1}^{t} \omega^i$. If the claim holds, then for all
$t > 0$, it holds that $X_t < X_{t-1}$ and $G_t = X_t$. Therefore,

    $G_t > X_0 + V_0 \sum_{i=1}^{\infty} \omega^i > X_0 + \epsilon a - X_0 = \epsilon a$,

and the proposition follows. Note that since $G_t = X_t$ for each $t \ge 0$, the velocity
reduces to $V_t = \omega V_{t-1}$.

The claim is proved by induction on $t$. The base case $t = 1$ clearly holds,
because $V_1 = V_0 \omega$ and $X_1 = X_0 + V_0 \omega$. Assume the claim holds for all iterations
smaller than $t$. By induction, it holds that $V_t = \omega V_{t-1} = \omega^t V_0$. Therefore, by
the induction hypothesis,

    $X_t = X_{t-1} + V_t = X_0 + V_0 \sum_{i=1}^{t-1} \omega^i + V_0 \omega^t = X_0 + V_0 \sum_{i=1}^{t} \omega^i$.

The claim now holds for all $t > 0$. The expected FHT conditional on the bad
initialisation event is therefore infinite. Since the bad initialisation event
occurs with positive probability, the unconditional expected FHT is infinite
by the law of total probability. □
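
The stagnation in Proposition 1 can be traced numerically. The sketch below
assumes a single particle in one dimension whose position stays positive and
decreases, so that the local best and the global best coincide with the current
position and both attraction terms vanish; the update is then deterministic.
The function names and the numerical values are hypothetical illustrations.

```python
def one_particle_trajectory(x0, v0, omega, steps):
    """Positions of a single particle when P = G = X in every step,
    so that the velocity update reduces to V_t = omega * V_{t-1}."""
    xs = [x0]
    x, v = x0, v0
    for _ in range(steps):
        v = omega * v      # attraction terms are zero because P = G = X
        x = x + v
        xs.append(x)
    return xs


def stagnation_limit(x0, v0, omega):
    """Limit of X_t: x0 + v0 * sum_{i>=1} omega^i = x0 + v0 * omega / (1 - omega)."""
    return x0 + v0 * omega / (1 - omega)


if __name__ == "__main__":
    xs = one_particle_trajectory(x0=1.0, v0=-0.2, omega=0.5, steps=30)
    print(xs[-1], stagnation_limit(1.0, -0.2, 0.5))
```

With these values the particle approaches the limit from above and never gets
closer to the optimum at 0 than that limit, no matter how many steps are run.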

We prove that the stagnation on Sphere illustrated in Proposition 1 is not
an artefact of a trivial swarm size of 1. In the following theorem, we prove
stagnation for a swarm of size 2 and think that the ideas can be generalised
to bigger swarm sizes. We allow any initialisation of the two particles that
are sufficiently far away from the optimum. It is assumed that both velocities
are non-positive in the initialisation step, an event which occurs with constant
probability for uniformly drawn velocities.

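Theorem 1. Consider the basic PSO with two particles on the one-dimensional
Sphere. If $\omega < 1$, $1 < \varphi_2 < 2$, $V_0^{(1)}, V_0^{(2)} \le 0$, $\kappa < 1$ where

    $\kappa := \frac{\varphi_2^2 - 2\varphi_2 + 2 + 2\omega\varphi_2 + \sqrt{(\varphi_2^2 - 2\varphi_2 + 2\omega\varphi_2 + 2)(\varphi_2^2 + 6\varphi_2 + 2\omega\varphi_2 + 2)}}{4\varphi_2}$,

and

    $X_0^{(1)}, X_0^{(2)} > 2\epsilon + \frac{2\varphi_2 \left(|D_0| + |V_0^{(1)}| + |V_0^{(2)}|\right)}{(1 - \omega)(1 - \kappa)}$

all hold together, then the expected FHT for the $\epsilon$-ball around the optimum is
infinite.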

The conditions are fulfilled, e.g., if $\varphi_2 = 1.5$, $\omega = 0.07$, $\epsilon = 0.5$,
$V_0^{(1)} = V_0^{(2)} = -1$, $X_0^{(1)} = 184$, and $X_0^{(2)} = 185$. For a proof, we note that the
assumed initialisation with positive particle positions, negative velocities and a
sufficiently large makes the sequences $X_t^{(i)}$, $i = 1, 2$, non-increasing provided no
negative values are reached. Furthermore, the update equation for the velocities
will then consist of three random non-positive terms, which means that velocities
remain negative. In Lemma 2, we focus on the distance $D_t := X_t^{(2)} - X_t^{(1)}$
of the particles and show that its expectation converges absolutely to zero.
The proof of this lemma makes use of Lemma 1, which gives a closed-form
solution to a generalisation of the Fibonacci-sequence. In another lemma, we 
consider the absolute velocities over time and show that the series formed by 
these also converges in expectation. The proof of the theorem will be completed 
by applications of Markov's inequality. 

Lemma 1. For any real $c > 0$, there exist two reals $A$ and $B$ such that the
difference equation $a_n = c(a_{n-1} + a_{n-2})$, $n > 1$, has the solution
$a_n = \alpha^n A + \beta^n B$, where

    $\alpha = \frac{c - \sqrt{c(4 + c)}}{2}$, and $\beta = \frac{c + \sqrt{c(4 + c)}}{2}$.

Proof. The proof is by induction over $n$. The lemma can always be satisfied for
$n = 1$ and $n = 2$ by choosing appropriate $A$ and $B$. Hence, assume that the
lemma holds for all $i < n$ for some $A$ and $B$. Note that

    $c(\alpha + 1) = \frac{c^2 - c\sqrt{c(4 + c)} + 2c}{2} = \alpha^2$, and
    $c(\beta + 1) = \frac{c^2 + c\sqrt{c(4 + c)} + 2c}{2} = \beta^2$.

It therefore follows by the induction hypothesis that

    $a_n = c(a_{n-1} + a_{n-2})$
        $= c(\alpha^{n-1} A + \beta^{n-1} B + \alpha^{n-2} A + \beta^{n-2} B)$
        $= c\alpha^{n-2}(\alpha + 1) A + c\beta^{n-2}(\beta + 1) B$
        $= \alpha^n A + \beta^n B$.

□
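
The closed form of Lemma 1 can be checked against direct iteration of the
recurrence. The sketch below assumes the sequence is indexed from $n = 0$ with
given start values $a_0$ and $a_1$, and solves the two linear equations
$A + B = a_0$ and $\alpha A + \beta B = a_1$ for the constants; the function
names and the test values are illustrative.

```python
import math


def roots(c):
    """The two roots alpha < beta of x^2 = c * (x + 1) for c > 0."""
    s = math.sqrt(c * (4 + c))
    return (c - s) / 2, (c + s) / 2


def closed_form(c, a0, a1, n_max):
    """a_n = alpha^n * A + beta^n * B with A, B fitted to a_0 and a_1."""
    alpha, beta = roots(c)
    B = (a1 - alpha * a0) / (beta - alpha)
    A = a0 - B
    return [A * alpha ** n + B * beta ** n for n in range(n_max + 1)]


def by_recurrence(c, a0, a1, n_max):
    """Iterate a_n = c * (a_{n-1} + a_{n-2}) directly."""
    seq = [a0, a1]
    for _ in range(2, n_max + 1):
        seq.append(c * (seq[-1] + seq[-2]))
    return seq


if __name__ == "__main__":
    c = 0.3
    diffs = [abs(x - y) for x, y in zip(closed_form(c, 1.0, 2.0, 20),
                                        by_recurrence(c, 1.0, 2.0, 20))]
    print(max(diffs))
```

Since $|\alpha| < \beta$ for $c > 0$, the growth or decay of the sequence is
governed by $\beta$, which is the role $\kappa$ plays in Lemma 2.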

Lemma 2. Given $t \ge 1$, suppose that for all $s < t$ it holds that $X_s^{(1)}, X_s^{(2)} \ge 0$
and $V_s^{(1)}, V_s^{(2)} \le 0$. Then $E(|D_t|) \le \kappa^t (2|D_0| + V_0^{(1)} - V_0^{(2)})$.

Proof. The proof is mainly based on an inspection of the update equation of
PSO. The aim is to obtain a recurrence for $E(|D_t|)$, where we have to distinguish
between two cases. We abbreviate $\varphi = \varphi_2$ and $S = S_t$ in the following.
If $X_t^{(1)} < X_t^{(2)}$, then $G_t = X_t^{(1)}$ and the update equations are

    $V_{t+1}^{(1)} = \omega V_t^{(1)}$
    $V_{t+1}^{(2)} = \omega V_t^{(2)} + S\varphi(X_t^{(1)} - X_t^{(2)}) = \omega V_t^{(2)} - S\varphi D_t$,

which means $D_{t+1} = X_{t+1}^{(2)} - X_{t+1}^{(1)} = X_t^{(2)} - X_t^{(1)} - S\varphi D_t + \omega(V_t^{(2)} - V_t^{(1)})$. Since
$V_t^{(i)} = X_t^{(i)} - X_{t-1}^{(i)}$ for $i = 1, 2$, we obtain $V_t^{(2)} - V_t^{(1)} = D_t - D_{t-1}$ for $t \ge 0$,
where we define $D_{-1} = D_0 - V_0^{(2)} + V_0^{(1)}$ to make the equation apply also for
$t = 0$. Together, this gives us

    $D_{t+1} = D_t - S\varphi D_t + \omega(D_t - D_{t-1})$.    (7)

If $X_t^{(1)} > X_t^{(2)}$, then the update equations are
$V_{t+1}^{(1)} = \omega V_t^{(1)} + S\varphi(X_t^{(2)} - X_t^{(1)}) = \omega V_t^{(1)} + S\varphi D_t$
and $V_{t+1}^{(2)} = \omega V_t^{(2)}$, which again results in (7) and finishes the case
analysis. Taking absolute values on both sides of (7) and applying the triangle
inequality to the right-hand side, we get

    $|D_{t+1}| \le |1 - S\varphi + \omega| \, |D_t| + |\omega| \, |D_{t-1}|$.

After taking the expectation and noting that $\omega \ge 0$, we have

    $E(|D_{t+1}| \mid D_t, D_{t-1}) \le (E(|1 - S\varphi|) + \omega)|D_t| + \omega|D_{t-1}|$,

which implies $E(|D_{t+1}|) \le (E(|1 - S\varphi|) + \omega) E(|D_t|) + \omega E(|D_{t-1}|)$, as the
right-hand side is linear in both $|D_{t-1}|$ and $|D_t|$. We are left with an estimate for
$E(|1 - S\varphi|)$. By the law of total probability,

    $E(|1 - S\varphi|) = E(1 - S\varphi \mid 1 - S\varphi \ge 0) \Pr(1 - S\varphi \ge 0)$
                      $+ E(S\varphi - 1 \mid 1 - S\varphi < 0) \Pr(1 - S\varphi < 0)$.

Since $S$ is uniformly distributed and $\varphi > 1$, the first conditional expectation is
$1/2$, while the second conditional expectation is $(\varphi - 1)/2$. The probabilities for
the conditions to occur are $1/\varphi$ and $1 - 1/\varphi$, respectively. This results in

    $E(|1 - S\varphi|) = \frac{1}{2\varphi} + \left(1 - \frac{1}{\varphi}\right) \frac{\varphi - 1}{2} = \frac{\varphi^2 - 2\varphi + 2}{2\varphi}$,

which finally gives us the following recurrence on $E(|D_t|)$:

    $E(|D_{t+1}|) \le \left(\frac{\varphi^2 - 2\varphi + 2}{2\varphi} + \omega\right) (E(|D_t|) + E(|D_{t-1}|))$.

Introducing $D^*_t := E(|D_t|)$ and using $\lambda = \frac{\varphi^2 - 2\varphi + 2}{\varphi} + 2\omega$, we have in more
compact form that

    $D^*_{t+1} \le \frac{\lambda}{2} (D^*_t + D^*_{t-1})$

for $t \ge 0$. Solving this recursion (noting that all terms are positive) using
Lemma 1 yields for $t \ge 1$ that




A + V8X + X 2 \ 1 



D* 



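Note that $\kappa = \frac{\lambda + \sqrt{8\lambda + \lambda^2}}{4} < 1$ if and only if $\lambda < 1$. Furthermore, the
other factor appearing in the solution has clearly smaller absolute value than $\kappa$. We obtain
$D^*_t \le \kappa^t (D^*_{-1} + D^*_0) \le \kappa^t (2D_0 + V_0^{(1)} - V_0^{(2)})$, which we wanted to show. □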

The following lemma uses the previous bound on E(\D t \) to show that the 
expected sum of velocities converges absolutely over time. This means that the 
maximum achievable progress is bounded in expectation. As an example, when 
choosing $\varphi_2 = 1.5$ and $\omega = 0.07$ in the following, we obtain a value of about
$191(|D_0| + |V_0^{(1)}| + |V_0^{(2)}|)$ for this bound.

Lemma 3. Suppose the prerequisites of Lemma 2 apply. Then for $i = 1, 2$ it
holds that

    $\sum_{t=0}^{\infty} E(|V_t^{(i)}|) \le \frac{2\varphi_2}{(1 - \omega)(1 - \kappa)} \left(|D_0| + |V_0^{(1)}| + |V_0^{(2)}|\right)$.



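Proof. For notational convenience, we drop the upper index and implicitly show
the following for both $i = 1$ and $i = 2$. According to the update equation of
PSO, we have $|V_{t+1}| \le \omega|V_t| + \varphi|D_t|$ for $t \ge 0$, using $\varphi := \varphi_2$. Resolving the
recurrence yields for $t \ge 1$ that

    $|V_t| \le \omega^t |V_0| + \varphi \sum_{s=0}^{t-1} \omega^s |D_{t-s-1}| = \omega^t |V_0| + \varphi \sum_{s=0}^{t-1} \omega^{t-1-s} |D_s|$.

Hence,

    $\sum_{t=1}^{\infty} |V_t| \le \sum_{t=1}^{\infty} \left( \omega^t |V_0| + \varphi \sum_{s=0}^{t-1} \omega^{t-1-s} |D_s| \right) \le \frac{\omega}{1 - \omega} |V_0| + \frac{\varphi}{1 - \omega} \sum_{s=0}^{\infty} |D_s|$,

since $0 \le \omega < 1$. Adding $|V_0|$, and using $\varphi > 1$ and the linearity of expectation,

    $\sum_{t=0}^{\infty} E(|V_t|) \le \frac{\varphi}{1 - \omega} \left( |V_0| + \sum_{t=0}^{\infty} E(|D_t|) \right)$.

By Lemma 2, $E(|D_t|) \le \kappa^t (2|D_0| + V_0^{(1)} - V_0^{(2)})$. Hence, the series over the
$E(|D_t|)$ converges according to

    $\sum_{t=0}^{\infty} E(|D_t|) \le \frac{1}{1 - \kappa} \left(2|D_0| + V_0^{(1)} - V_0^{(2)}\right)$,

which yields

    $\sum_{t=0}^{\infty} E(|V_t|) \le \frac{\varphi}{1 - \omega} |V_0| + \frac{\varphi}{(1 - \omega)(1 - \kappa)} \left(2|D_0| + V_0^{(1)} - V_0^{(2)}\right)$
                          $\le \frac{\varphi}{(1 - \omega)(1 - \kappa)} \cdot \left(2|D_0| + 2|V_0^{(1)}| + 2|V_0^{(2)}|\right)$,

where we have used $\kappa < 1$. □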

We are ready to prove Theorem 1.

Proof of Theorem 1. Throughout this proof, we suppose the prerequisites from
Lemma 2 to hold, which, as we will show, is true for an infinite number of steps
with constant probability.

For any finite $t$, Lemma 3 and linearity of expectation yield for $i = 1, 2$ that

    $E\left(\sum_{s=0}^{t} |V_s^{(i)}|\right) \le \frac{2\varphi_2}{(1 - \omega)(1 - \kappa)} \left(|D_0| + |V_0^{(1)}| + |V_0^{(2)}|\right)$,

which by Markov's inequality means that the event

    $\sum_{s=0}^{t} |V_s^{(i)}| \le \epsilon + \frac{2\varphi_2}{(1 - \omega)(1 - \kappa)} \left(|D_0| + |V_0^{(1)}| + |V_0^{(2)}|\right)$

occurs with a positive probability that does not depend on $t$. Given the assumed
initial values of $X_0^{(i)}$, the $\epsilon$-ball around the optimum is not reached if the event
occurs. Hence, there is a minimum probability $p^*$ such that for any finite number
of steps $t$, the probability of not hitting the $\epsilon$-ball within $t$ steps is at least $p^*$.
Consequently, the expected first hitting time of the $\epsilon$-ball is infinite. □


4 Mean Square Convergence



As mentioned in the introduction, there exist several convergence analyses using 
different techniques that take into account the stochastic effects of the algorithm. 
The analysis by Jiang et al. [8] is perhaps the one where the proof of mean square 
convergence follows most directly from the definition. They consider the basic 
PSO and prove the following statement (Theorem 5 in their paper): 

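Statement 1. Given $\omega, \varphi_1, \varphi_2 \ge 0$, if $0 \le \omega < 1$, $\varphi_1 + \varphi_2 > 0$, and

    $0 < -(\varphi_1 + \varphi_2)\omega^2 + \left(\tfrac{1}{6}\varphi_1^2 + \tfrac{1}{6}\varphi_2^2 + \tfrac{1}{2}\varphi_1\varphi_2\right)\omega + \varphi_1 + \varphi_2 - \tfrac{1}{3}\varphi_1^2 - \tfrac{1}{3}\varphi_2^2 - \tfrac{1}{2}\varphi_1\varphi_2 < \frac{\varphi_2^2 (1 + \omega)}{6}$

are all satisfied, the basic particle swarm system determined by parameter tuple
$\{\omega, \varphi_1, \varphi_2\}$ will converge in mean square to $G$.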

This statement is claimed to hold for any fitness function and for any initial 
swarm configuration. However, as acknowledged by the corresponding author 
[7], there is an error in the proof of the above statement, which is actually wrong 
without additional assumptions. 

Intuitively, Statement 1 makes sense for well-behaved, continuous functions
like Sphere. However, in retrospect, it is not too difficult to set up artificial
fitness functions and swarm configurations where the statement is wrong: Let us
consider the one-dimensional function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(0) = 0$, $f(1) = 1$,
and $f(x) = 2$ for all $x \in \mathbb{R} \setminus \{0, 1\}$, which is to be minimised.

Assume a swarm of two particles, where the first one has position 0, which is
then its local best and the global best. Furthermore, assume velocity 0 for this
particle, i.e., it has stagnated. Formally, $X_0^{(1)} = P_0^{(1)} = G_0 = V_0^{(1)} = 0$. Now
let us say the second particle has current and local best position 1 and velocity
0, formally $X_0^{(2)} = P_0^{(2)} = 1$ and $V_0^{(2)} = 0$. This particle will now be attracted
by a weighted combination of local and global best, e.g. the point 0.5 if both
learning rates are the same. The problem is that the particle's local best almost
surely will never be updated again, since the probability of sampling either the
local best or the global best is 0 if the sampling distribution is uniform on an
interval of positive volume or is the sum of two such distributions, as it is
defined in the basic PSO. The sampling distribution might be deterministic
because both $P_t^{(i)} - X_t^{(i)}$ and $G_t - X_t^{(i)}$ might be 0, but then the progress
corresponds to the last velocity value, which again was either obtained according
to the sum of two uniform distributions or was already 0. The error in the
analysis is hidden in the proof of Theorem 4 in [8], where $\Pr(X_t = G) > 0$ is
concluded even though $G$ might be in a null set.

Nevertheless, important parts of the preceding analysis can be saved and a
theorem on convergence can be proved under additional assumptions on the
fitness function. In the following, we describe the main steps in the
convergence analysis by Jiang et al. A key idea in [8] is to consider a
one-dimensional algorithm and an arbitrary particle, assuming that the local
best for this particle and global best do not change. Then a recurrence relation
is obtained as follows:

    $X_{t+1} = (1 + \omega - (\varphi_1 R_t + \varphi_2 S_t)) X_t - \omega X_{t-1} + \varphi_1 R_t P + \varphi_2 S_t G$,

where we dropped the index denoting the arbitrary particle we have chosen,
and the time index for local and global best. The authors proceed by deriving
sufficient conditions for the sequence of expectations $E(X_t)$, $t \ge 1$, to converge
(Theorem 1 in their paper).

Lemma 4. Given $\omega, \varphi_1, \varphi_2 \ge 0$, if and only if $0 \le \omega < 1$ and
$0 < \varphi_1 + \varphi_2 < 4(1 + \omega)$, the iterative process $E(X_t)$ is guaranteed to converge to
$(\varphi_1 P + \varphi_2 G)/(\varphi_1 + \varphi_2)$.

Even though a process converges in expectation, its variance might diverge, 
which intuitively means that it becomes more and more unlikely to observe the 
actual process in the vicinity of the expected value. Another major achievement 
by Jiang et al. [8] is to study the variances $\mathrm{Var}(X_t)$ of the still one-dimensional
process. By a clever analysis of a recurrence of order 3, they obtain the following 
lemma (Theorem 3 in their paper). 

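Lemma 5. Given $\omega, \varphi_1, \varphi_2 \ge 0$, if and only if $0 \le \omega < 1$, $\varphi_1 + \varphi_2 > 0$ and
$f(1) > 0$ are all satisfied together, iterative process $\mathrm{Var}(X_t)$ is guaranteed to
converge to $\tfrac{1}{6} (\varphi_1\varphi_2/(\varphi_1 + \varphi_2))^2 (G - P)^2 (1 + \omega)/f(1)$, where

    $f(1) = -(\varphi_1 + \varphi_2)\omega^2 + \left(\tfrac{1}{6}\varphi_1^2 + \tfrac{1}{6}\varphi_2^2 + \tfrac{1}{2}\varphi_1\varphi_2\right)\omega + \varphi_1 + \varphi_2 - \tfrac{1}{3}\varphi_1^2 - \tfrac{1}{3}\varphi_2^2 - \tfrac{1}{2}\varphi_1\varphi_2$.

Lemma 5 means that the variance is proportional to $(P - G)^2$. However,
in contrast to what Jiang et al. [8] would like to achieve in their Theorem 4,
we do not see how to prove that the variance approaches 0 for every particle.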
Clearly, this happens for the global best particle under the assumption that 
no further improvements of the global best are found. We do not follow this 
approach further since we are interested in PSO variants that converge to a 
local optimum. 



5 Noisy PSO 

The purpose of this section is to consider a variant of the basic PSO that 
includes a noise term. This PSO, which we call the Noisy PSO, is defined 
as in Algorithm 1, except that Eq. (5) is replaced by the velocity equation

    $V_{t+1,j}^{(i)} = \omega V_{t,j}^{(i)} + \varphi_1 R_{t,j}^{(i)} (P_{t,j}^{(i)} - X_{t,j}^{(i)}) + \varphi_2 S_{t,j}^{(i)} (G_{t,j} - X_{t,j}^{(i)}) + \Delta_{t,j}^{(i)}$,

where the extra noise term $\Delta_{t,j}^{(i)}$ has uniform distribution on the interval
$[-\delta/2, \delta/2]$. Note that our analysis seems to apply also when the uniform
distribution is replaced by a Gaussian one with the same expectation and
variance. The constant parameter $\delta > 0$ controls the noise level in the
algorithm. Due to the uniformly distributed noise term, it is immediate that the
variance of each particle is always at least $\delta^2/12$. Therefore, the Noisy PSO
does not enjoy the mean square convergence property of the basic PSO. In return,
the Noisy PSO does not suffer the stagnation problems discussed in Section 3,
and finite expected first hitting times can in some cases be guaranteed. The
Noisy PSO uses similar measures to
avoid stagnation as the GCPSO mentioned in the introduction. However, our 
approach is simpler and treats all particles in the same way. On the other hand, 
the GCPSO relies on a specific update scheme for the global best particle. 
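
The modified velocity update can be sketched as follows. This is a minimal
one-dimensional illustration, assuming the noise term is drawn uniformly from
$[-\delta/2, \delta/2]$ independently in every call; the function name and the
parameter values are hypothetical. For a stagnated particle (zero velocity,
position equal to both local and global best) the new velocity is pure noise,
whose sample variance should be close to $\delta^2/12$.

```python
import random


def noisy_velocity(v, x, p, g, omega, phi1, phi2, delta, rng):
    """One-dimensional Noisy PSO velocity update: the basic update plus a
    noise term that is uniform on [-delta/2, delta/2]."""
    r, s = rng.random(), rng.random()
    noise = rng.uniform(-delta / 2, delta / 2)
    return omega * v + phi1 * r * (p - x) + phi2 * s * (g - x) + noise


if __name__ == "__main__":
    rng = random.Random(0)
    delta = 1.2
    # Stagnated particle: v = 0 and x = p = g, so only the noise remains.
    vs = [noisy_velocity(0.0, 1.0, 1.0, 1.0, 0.7, 1.5, 1.5, delta, rng)
          for _ in range(100_000)]
    mean = sum(vs) / len(vs)
    var = sum((v - mean) ** 2 for v in vs) / len(vs)
    print(var, delta ** 2 / 12)
```

In the basic PSO the same stagnated particle would never move again, which is
the situation exploited in Section 3.
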

Our main result considers the simplified case of a one-dimensional function
but takes into account the whole particle swarm. For simplicity, we only
consider the half-open positive interval by defining $\mathrm{Sphere}_+(x) := \mathrm{Sphere}(x)$ if
$x \ge 0$, and $\mathrm{Sphere}_+(x) := \infty$ otherwise, which has to be minimised, and
assume that at least one particle is initialised in the positive region. This event
happens with positive probability for a standardised initialisation scheme. It
seems that our analyses can be adapted to the standard Sphere (and
order-preserving transformations thereof), but changes of sign complicate the analysis
considerably. Note that the analyses of stagnation in Section 3 only consider
positive particle positions and thus apply to $\mathrm{Sphere}_+$ as well.

Theorem 2. Consider the Noisy PSO on the $\mathrm{Sphere}_+$ function and assume
$G_0 > 0$. If $\delta < \epsilon$, $f(1) > 1/3$, and the assumptions from Theorems 3 and 4 below
hold, then the expected first hitting time for the interval $[0, \epsilon]$ is finite.

The proof of this theorem relies heavily on the convergence analysis by Jiang
et al. [8]. We will adapt their results to the Noisy PSO. Recall that the only
difference between the two algorithms is the addition of $\Delta_t^{(i)}$ in the update
equation for the particle position. It is important to note that $\Delta_t^{(i)}$ is drawn from
$[-\delta/2, \delta/2]$ (considering one dimension) fully independently for every particle
and time step. As mentioned above, Jiang et al. [8] consider a one-dimensional
algorithm and an arbitrary particle, assuming that the local best for this particle
and global best do not change. Then a recurrence relation is obtained by
manipulating the update equations. Taking this approach for the Noisy PSO yields:

    $X_{t+1} = (1 + \omega - (\varphi_1 R_t + \varphi_2 S_t)) X_t - \omega X_{t-1} + \varphi_1 R_t P + \varphi_2 S_t G + \Delta_t$,

where we dropped the index for the dimension, the index denoting the arbitrary
particle we have chosen and the time index for local and global best. This is the
same recurrence relation as in [8] except for the addition of $\Delta_t$. The authors
proceed by deriving sufficient conditions for the sequence of expectations
$E(X_t)$, $t \ge 1$, to converge. Since $E(\Delta_t) = 0$, the recurrence relation for the
expectations is exactly the same as with the basic PSO and the following theorem
can be taken over.

Theorem 3. Given $\omega, \varphi_1, \varphi_2 \ge 0$, if and only if $0 \le \omega < 1$ and
$0 < \varphi_1 + \varphi_2 < 4(1 + \omega)$, the iterative process $E(X_t)$ is guaranteed to converge to
$(\varphi_1 P + \varphi_2 G)/(\varphi_1 + \varphi_2)$.

The next step is to study the variances $\mathrm{Var}(X_t)$ of the one-dimensional
process. Obviously, modifications of the original analysis in [8] become
necessary here. To account for the addition of $\Delta_t$, we replace Eq. (11) in the
paper¹ by $Y_{t+1} = (\psi - R_t) Y_t - \omega Y_{t-1} + Q'_t$, where $Q'_t := Q_t + \Delta_t$ and $Q_t$ is
the original $Q_t$ from the paper. Regarding the quantities involving $Q'_t$ in the
following, we observe that $E(Q'_t) = E(Q_t) = 0$ and
$\mathrm{Var}(Q'_t) = E((Q'_t)^2) = E((Q_t + \Delta_t)(Q_t + \Delta_t)) = E(Q_t^2 + 2\Delta_t Q_t + \Delta_t^2) = E(Q_t^2) + E(\Delta_t^2)$,
where we used that $\Delta_t$ is drawn independently of other random variables. Finally, we get
$E(R_t Q'_t) = E(R_t (Q_t + \Delta_t)) = E(R_t Q_t)$, which means that all following
calculations in Section 3.2 in [8] may use the same values for the variables $R$ and $T$ as
before. Only the variable $Q$ increases by $E(\Delta_t^2)$. Recall that $\Delta_t \sim U[-\delta/2, \delta/2]$
for constant $\delta > 0$. We obtain $E(\Delta_t^2) = \delta^2/12$. Now the iteration equation (17)
for $\mathrm{Var}(X_t)$ can be taken over with $Q$ increased by $\delta^2/12$. The characteristic
equation (18) remains unchanged and Theorem 2 of [8] applies in the same way as
before. Theorem 3 in [8] is updated in the following way and proved as before,
except for plugging in the updated value of $Q$.

¹ When referring to the analysis by Jiang et al. [8], $R_t$ does not mean the random factor
in the cognitive component, but should be understood as defined in their paper.

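Theorem 4. Given $\omega, \varphi_1, \varphi_2 \geq 0$, if and only if $0 \leq \omega < 1$, $\varphi_1 + \varphi_2 > 0$ and
$f(1) > 0$ are all satisfied together, the iterative process $\mathrm{Var}(X_t)$ is guaranteed to
converge to $\left(\frac{1}{6}\left(\frac{\varphi_1 \varphi_2}{\varphi_1 + \varphi_2}\right)^2 (G - P)^2 (1 + \omega) + \delta^2/12\right)/f(1)$, where

\[
f(1) = -(\varphi_1 + \varphi_2)\,\omega^2 + \left(\tfrac{1}{6}\varphi_1^2 + \tfrac{1}{6}\varphi_2^2 + \tfrac{1}{2}\varphi_1 \varphi_2\right)\omega + \varphi_1 + \varphi_2 - \tfrac{1}{3}\varphi_1^2 - \tfrac{1}{3}\varphi_2^2 - \tfrac{1}{2}\varphi_1 \varphi_2 .
\]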
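The limit in Theorem 4 is easy to evaluate numerically. The following sketch assumes that $f(1)$ has the polynomial form of the variance analysis in [8], with coefficients $1/6$, $1/6$, $1/2$ on the $\omega$ term and $1/3$, $1/3$, $1/2$ on the constant term. The function names are hypothetical, and setting `delta = 0` recovers the basic PSO.

```python
def f_one(omega, phi1, phi2):
    """Value f(1) of the characteristic polynomial (assumed form, following [8])."""
    return (-(phi1 + phi2) * omega**2
            + (phi1**2 / 6 + phi2**2 / 6 + phi1 * phi2 / 2) * omega
            + phi1 + phi2 - phi1**2 / 3 - phi2**2 / 3 - phi1 * phi2 / 2)


def variance_limit(omega, phi1, phi2, p, g, delta=0.0):
    """Limit of Var(X_t) as stated in Theorem 4 for fixed bests p and g.

    delta is the width of the uniform noise interval of the Noisy PSO;
    delta = 0 corresponds to the basic PSO.
    """
    f1 = f_one(omega, phi1, phi2)
    if not (0 <= omega < 1 and phi1 + phi2 > 0 and f1 > 0):
        raise ValueError("parameters outside the convergence region of Theorem 4")
    swarm_term = (phi1 * phi2 / (phi1 + phi2))**2 * (g - p)**2 * (1 + omega) / 6
    return (swarm_term + delta**2 / 12) / f1


if __name__ == "__main__":
    # Particle whose local best equals the global best (p == g).
    print(variance_limit(0.4, 1.5, 1.5, p=1.0, g=1.0))             # basic PSO
    print(variance_limit(0.4, 1.5, 1.5, p=1.0, g=1.0, delta=0.1))  # Noisy PSO
```

For a particle with $P = G$ the swarm term vanishes, so the basic PSO has limiting variance 0 while the Noisy PSO keeps the strictly positive variance $\delta^2/(12 f(1))$.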

As a consequence of the preceding theorem, the variance remains positive
even for the particle $i$ that satisfies $P^{(i)} = G$. Under simplifying assumptions,
we show that this particle allows the system to approach the optimum. Later,
we will show how to drop these assumptions.

Lemma 6. Assume that $f(1) > 1/3$, that the global and local bests are never
updated, and that the conditions in Theorem 3 and Theorem 4 hold. Then for
all sufficiently small $\varepsilon' > 0$, there exists a $t_0 > 0$ such that

\[
\forall t \geq t_0: \quad \Pr\left[G - \delta < X_t < G - \delta/100 + \varepsilon'\right] \geq 3/100000,
\]

where $X_t \in \mathbb{R}$ is the position of the particle in iteration $t$ for which the local best
position equals the global best position.

Proof. We assume that $G > \varepsilon$, otherwise there is nothing to show. Furthermore,
$G - \delta$ cannot be negative since $\delta < \varepsilon$. We decompose the process by defining
$Y_t = X_t - \Delta_t$. Our goal is to prove that it is unlikely that $Y_t$ is much larger than
$G$ using Chebyshev's inequality. We therefore need to estimate the expectation
and the variance of $Y_t$. From Theorem 3 and the fact that $P = G$ holds for the
best particle,

\[
\lim_{t \to \infty} E(Y_t) = \lim_{t \to \infty} \bigl(E(X_t) - E(\Delta_t)\bigr) = G. \tag{8}
\]

To estimate the variance of $Y_t$, first recall that by Theorem 4 it holds that

\[
\lim_{t \to \infty} \mathrm{Var}(X_t) = \frac{\delta^2}{12 f(1)}. \tag{9}
\]

Due to the independence of the random variables $Y_t$ and $\Delta_t$, we have $\mathrm{Var}(X_t) =
\mathrm{Var}(Y_t) + \mathrm{Var}(\Delta_t)$. The random variable $\Delta_t$ has variance $\delta^2/12$. The limit in
(9) therefore implies that

\[
\lim_{t \to \infty} \mathrm{Var}(Y_t) = \lim_{t \to \infty} \bigl(\mathrm{Var}(X_t) - \mathrm{Var}(\Delta_t)\bigr) = \sigma_Y^2, \tag{10}
\]

where we have defined $\sigma_Y^2 := \delta^2 (1 - f(1))/(12 f(1)) < \delta^2/6$; the inequality
follows from the assumption $f(1) > 1/3$. Combining Eq. (8)
and Eq. (10) yields $\lim_{t \to \infty} \bigl(E(Y_t) + (6/5)\sqrt{\mathrm{Var}(Y_t)}\bigr) = G + (6/5)\sigma_Y$. This limit
implies that for any $\varepsilon' > 0$, there exists a $t_0 > 0$ such that for all $t \geq t_0$, $E(Y_t) +
(6/5)\sqrt{\mathrm{Var}(Y_t)} < G + (6/5)\sigma_Y + \varepsilon' < G + 0.4899\delta + \varepsilon'$, and analogously $E(Y_t) -
(6/5)\sqrt{\mathrm{Var}(Y_t)} > G - 0.4899\delta - \varepsilon'$. By the inequalities above, and by Chebyshev's
inequality, it holds that



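\[
p := \Pr\left[|Y_t - G| \geq 0.4899\delta + \varepsilon'\right] \leq \Pr\left[|Y_t - E(Y_t)| \geq (6/5)\sqrt{\mathrm{Var}(Y_t)}\right] \leq 25/36.
\]

Obviously, the larger $Y_t$ is, the more restrictive the requirements on the outcome
of $\Delta_t$ are. Hence, choosing $t$ so large that $\varepsilon' < (1 - 0.4899)\delta$ holds, we get the
desired result

\[
\Pr\left[G - \delta < X_t < G - \delta/100 + \varepsilon'\right] \geq \Pr\left[\Delta_t \leq -\delta(1/100 + 0.4899)\right] \cdot (1 - p) \geq 3/100000. \qquad \Box
\]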
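The numerical constants in this proof can be recomputed with exact rational arithmetic. The sketch below only restates the arithmetic of the proof: the worst case $\sigma_Y < \delta/\sqrt{6}$, the Chebyshev factor $6/5$, and the uniform noise on $[-\delta/2, \delta/2]$. All quantities are expressed in units of $\delta$, and the function names are illustrative.

```python
from fractions import Fraction
import math

# Worst-case standard deviation of Y_t in units of delta (sigma_Y^2 < delta^2 / 6).
SIGMA_Y_MAX = math.sqrt(1.0 / 6.0)


def chebyshev_radius(k=1.2):
    """Upper bound on k * sigma_Y / delta; k = 6/5 is the factor used in the proof."""
    return k * SIGMA_Y_MAX


def chebyshev_tail(k=Fraction(6, 5)):
    """Chebyshev bound: Pr[|Y - E(Y)| >= k * sd] <= 1 / k^2."""
    return 1 / k**2


def noise_probability(a):
    """Pr[Delta <= -a * delta] for Delta uniform on [-delta/2, delta/2]."""
    if not 0 <= a <= Fraction(1, 2):
        raise ValueError("a must lie in [0, 1/2]")
    return Fraction(1, 2) - a


def lemma6_lower_bound():
    """Product bound from the last step of the proof."""
    a = Fraction(1, 100) + Fraction(4899, 10000)
    return noise_probability(a) * (1 - chebyshev_tail())


if __name__ == "__main__":
    print(chebyshev_radius())    # slightly below 0.4899
    print(chebyshev_tail())      # 25/36
    print(lemma6_lower_bound())  # 11/360000
```

The product $\frac{1}{10000} \cdot \frac{11}{36} = \frac{11}{360000}$ is slightly larger than $3/100000$, which is the constant stated in the lemma.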

The previous lemma does not make any assumption on the objective
function. With regard to Sphere$^+$, it implies that the global best (assuming $G_0 > 0$)
will be improved after some time almost surely. However, since the
precondition is that the particle has not improved for a while, this is not yet sufficient to
ensure finite hitting time to an $\varepsilon$-ball around the optimum. One might imagine
that the global best position is constantly updated while its value converges to
some value greater than 0.

A closer look into the proofs of Theorem 3 and Theorem 4 and the underlying
difference equations in [5] reveals that they also apply to every particle $i$ where
$(G_t - P_t^{(i)})^2$ converges to a fixed value. In fact, as we will show in Lemma 7,
it holds that $(G_t - P_t^{(i)})^2$ almost surely is a null sequence for every positively
initialised particle if certain assumptions on the parameters are met. Informally,
this means that the personal best converges to the global best on Sphere$^+$,
which might be considered as a corrected version of the erroneous Statement 1
in [8].

Lemma 7. Consider the basic PSO on Sphere$^+$. If $f(1) > \max\{\varphi_1^2, \varphi_2^2\}(1 +
\omega)/6$ and the assumptions from Theorems 3 and 4 hold, then $(G_t - P_t^{(i)})^2$ is
a null sequence for every particle $i$ that satisfies $P_0^{(i)} > 0$. The statement also
holds for the Noisy PSO if additionally $f(1) > 1/3$ is assumed.

Proof. If there is no particle satisfying $P_0^{(i)} > 0$, nothing is to show. Otherwise,
we have $G_0 > 0$. Pick an arbitrary particle $i$ satisfying $P_0^{(i)} > 0$ and assume
that $(G_t - P_t^{(i)})^2$ is not a null sequence for this particle. Because of the special
properties of one-dimensional Sphere$^+$ and the conditions $G_0 > 0$ and $P_0^{(i)} >
0$, the sequences $G_t$ and $P_t^{(i)}$ are monotone decreasing and bounded, hence they
are convergent. Assume that $(G_t - P_t^{(i)})^2$ converges to some non-zero value.
According to Theorems 3 and 4, $E(X_t)$ and $\mathrm{Var}(X_t)$ converge; more precisely,
it holds for the expectation that $\lim G_t < \lim E(X_t) < \lim P_t^{(i)}$, where all limits
(also in the following) are for $t \to \infty$. In the case of the basic PSO, we obtain
from Theorems 3 and 4 that $\lim E(X_t) = (\varphi_1 \lim P_t^{(i)} + \varphi_2 \lim G_t)/(\varphi_1 + \varphi_2)$
and

\[
\lim \mathrm{Var}(X_t) = \frac{1}{6} \left(\frac{\varphi_1 \varphi_2}{\varphi_1 + \varphi_2}\right)^2 \bigl(\lim G_t - \lim P_t^{(i)}\bigr)^2 \, \frac{1 + \omega}{f(1)}.
\]

If $f(1) > \varphi_2^2 (1 + \omega)/6$, we obtain $\lim \mathrm{Var}(X_t) < (\varphi_1/(\varphi_1 + \varphi_2))^2 (\lim G_t -
\lim P_t^{(i)})^2 = (\lim G_t - \lim E(X_t))^2$, and if $f(1) > \varphi_1^2 (1 + \omega)/6$ we obtain
$\lim \mathrm{Var}(X_t) < (\lim P_t^{(i)} - \lim E(X_t))^2$. If both inequalities apply, then the
variance is smaller than the smaller of the two squared distances, and Chebyshev's
inequality yields that $|X_t - \lim E(X_t)| < \min\{|\lim P_t^{(i)} - \lim E(X_t)|, |\lim G_t -
\lim E(X_t)|\}$, implying $G_t < X_t < P_t$, will occur with positive probability for
sufficiently large $t$ (using the same methods as in the proof of Lemma 6, the
errors become negligible if $t$ is large enough). This leads to an improvement
of $P_t$ by a positive amount and also $(P_t - G_t)^2$ will decrease by a positive
amount. Note that the lower bound on the size of the positive improvement does
not change as time increases. As $t$ approaches infinity, the improvement will
happen almost surely.

In the case of the Noisy PSO, the argumentation is similar. However, since
the limit of the variance increases by $(\delta^2/12)/f(1)$ according to Theorem 4,
we will decompose the stochastic process in the same way as in the proof of
Lemma 6 and combine the calculations that follow from $f(1) > 1/3$ with the
considerations presented above for the basic PSO. For the variable $Y_t$,
Chebyshev's inequality yields that $|Y_t - \lim E(X_t)| < |\lim P_t^{(i)} - \lim E(X_t)| + 0.4899\delta$
and $|Y_t - \lim E(X_t)| < |\lim G_t - \lim E(X_t)| + 0.4899\delta$ both occur with positive
probability. Hence, the support of $X_t = Y_t + \Delta_t$ is a superset of a subset of
$[\lim G_t, \lim P_t^{(i)}]$ with positive measure. Therefore, an improvement by a
certain positive amount has positive probability and will occur almost surely as $t$
tends to infinity. $\Box$
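The preconditions of Lemma 7 are plain inequalities in $\omega$, $\varphi_1$ and $\varphi_2$, so a given parameter setting can be checked mechanically. The sketch below assumes the polynomial form of $f(1)$ from the variance analysis in [8] and the conditions of Theorems 3 and 4 as used in this section; the helper names are hypothetical.

```python
def f_one(omega, phi1, phi2):
    """Value f(1) of the characteristic polynomial (assumed form, following [8])."""
    return (-(phi1 + phi2) * omega**2
            + (phi1**2 / 6 + phi2**2 / 6 + phi1 * phi2 / 2) * omega
            + phi1 + phi2 - phi1**2 / 3 - phi2**2 / 3 - phi1 * phi2 / 2)


def lemma7_preconditions(omega, phi1, phi2, noisy=False):
    """Return True if the parameter conditions of Lemma 7 are satisfied.

    noisy=True adds the extra requirement f(1) > 1/3 for the Noisy PSO.
    """
    f1 = f_one(omega, phi1, phi2)
    # Theorem 3: convergence of the expectation.
    mean_ok = 0 <= omega < 1 and 0 < phi1 + phi2 < 4 * (1 + omega)
    # Theorem 4: convergence of the variance.
    var_ok = f1 > 0
    # Lemma 7: variance small enough compared with both squared distances.
    gap_ok = f1 > max(phi1**2, phi2**2) * (1 + omega) / 6
    noise_ok = f1 > 1 / 3 if noisy else True
    return mean_ok and var_ok and gap_ok and noise_ok


if __name__ == "__main__":
    print(f_one(0.4, 1.5, 1.5))
    print(lemma7_preconditions(0.4, 1.5, 1.5, noisy=True))
    print(lemma7_preconditions(0.7, 1.5, 1.5))
```

For $\omega = 0.4$ and $\varphi_1 = \varphi_2 = 1.5$ the sketch gives $f(1) \approx 0.645$, which exceeds both $\max\{\varphi_1^2, \varphi_2^2\}(1 + \omega)/6 = 0.525$ and $1/3$. Raising the inertia to $\omega = 0.7$ with the same acceleration coefficients violates the condition of Lemma 7.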

Remark: The preconditions are satisfied for $\omega = 0.4$, $\varphi_1 = \varphi_2 = 1.5$, which
is included in the convergence region of (c.f. Figure [j}. The proof of the
lemma is in the appendix. We can formulate the announced generalisation of
Lemma 6.

Lemma 8. Assume that $f(1) > 1/3$ and that the conditions in Theorem 3 and
Theorem 4 hold. Consider the Noisy PSO on Sphere$^+$, pick a particle $i$
satisfying $P_0^{(i)} > 0$ and denote $G^* = \lim_{t \to \infty} G_t$. Then for all sufficiently small $\varepsilon' > 0$,
there exists a $t_0 > 0$ such that $\forall t \geq t_0$, $\Pr\left[G^* - \delta < X_t < G^* - \delta/100 + \varepsilon'\right] \geq
3/100000$, where $X_t$ is the position of the considered particle in iteration $t$.

Proof. By Lemma 7, $(G_t - P_t^{(i)})^2$ converges to 0 for particle $i$. The stability
analysis of the inhomogeneous difference equations for $E(X_t^{(i)})$ and $\mathrm{Var}(X_t^{(i)})$ does
not change and we still get $\lim_{t \to \infty} E(X_t^{(i)}) = G^*$ and $\lim_{t \to \infty} \mathrm{Var}(X_t^{(i)}) =
\delta^2/(12 f(1))$ as in Lemma 6. The proof is completed as before. $\Box$
We are ready to prove the main result in this section.

Proof of the main theorem. Since it is monotonically decreasing and bounded, the
sequence $G_t$ has a limit $G$. If $G$ is in the $\varepsilon$-ball around the origin, nothing is
to show. Otherwise, we have $G > \varepsilon$ and according to Lemma 8, some point
$X \in [G - \delta, G - \delta/100]$ will be sampled almost surely in finite time. After a
finite number of such improvements, the $\varepsilon$-ball around the optimum will have
been reached. $\Box$

6 Conclusions 

Much of the theoretical research on the particle swarm optimiser has focused
on its convergence properties. In particular, conditions have been found which
have been claimed to guarantee mean square convergence. We point out an error
in the proof of this claim, showing that the mean square convergence property
does not hold for all functions. Still, we think particle convergence is not
always desirable, in particular when it occurs at non-optimal points in the search
space. To better understand the PSO as an optimiser, we suggest putting more
effort into understanding the expected first hitting time (FHT) of the algorithm
to an arbitrarily small $\varepsilon$-ball around the optimum. We point out non-trivial
configurations where the basic PSO has infinite expected FHT even on simple
problems like the Sphere function. As a remedy to this undesirable situation,
we abandon convergence in mean square, and propose the Noisy PSO, which
has non-zero particle variance but finite expected FHT on the one-dimensional
Sphere function.